Challenges and Design Issues in Search Engine and Web Crawler

Authors

  • Rahul Mahajan
  • Rajeev Bedi

Similar resources

Web Crawlers: Taxonomy, Issues & Challenges

With the increase in the size of the Web, search engines rely on Web crawlers to build and maintain an index of billions of pages for efficient searching. Web crawlers create and maintain these indices by recursively traversing and downloading Web pages on behalf of search engines. The exponential growth of the Web poses many challenges for crawlers. This paper makes an at...

Design and Implementation of a High-Performance Distributed Web Crawler

Broad web search engines as well as many more specialized search tools rely on web crawlers to acquire large collections of pages for indexing and analysis. Such a web crawler may interact with millions of hosts over a period of weeks or months, and thus issues of robustness, flexibility, and manageability are of major importance. In addition, I/O performance, network resources, and OS limits m...

Crawling the Web: Discovery and Maintenance of Large-scale Web Data

This dissertation studies the challenges and issues faced in implementing an effective Web crawler. A crawler is a program that retrieves and stores pages from the Web, commonly for a Web search engine. A crawler often has to download hundreds of millions of pages in a short period of time and has to constantly monitor and refresh the downloaded pages. In addition, the crawler should avoid putt...

A Framework for Bridging the Gap Between Open Source Search Tools

Building a search engine that can scale to billions of documents while satisfying the needs of the users presents serious challenges. Few successful stories have been reported so far [36]. Here, we report our experience in building YouSeer, a complete open source search engine tool that includes both an open source crawler and an open source indexer. Our approach takes other open source compone...

Design and Implementation of Scalable, Fully Distributed Web Crawler for a Web Search Engine

The Web is a context in which traditional Information Retrieval methods are challenged. Given the volume of the Web and its speed of change, the coverage of modern web search engines is relatively small. Search engines use crawlers to traverse the web exhaustively for new pages and to keep track of changes made to pages visited earlier. The centralized design of crawlers introduces limita...

Publication date: 2014